
Introduction to Time Series Analysis - 02

This note is for course MATH 545 at McGill University.

Lecture 4 - Lecture 6

We recommend an R package named “forecast”.


First-Order Autoregressive Process AR(1)

Assume $\{X_t\}$ is a stationary sequence of random variables satisfying $X_t = \phi X_{t-1} + Z_t$ for $t = 0, \pm 1, \pm 2, \ldots$, where $\{Z_t\}$ is a white noise process $WN(0, \sigma^2)$, $Z_t$ is uncorrelated with $X_s$ for all $s < t$, and $\phi$ is a real-valued constant.

(Graphical representation will be added later)

We have $E(X_t) = \phi E(X_{t-1}) + E(Z_t) = \phi \mu_X + 0$, where $\mu_X = E(X_t)$ is constant by stationarity.

Since $\{X_t\}$ is assumed stationary, $\mu_X = \phi \mu_X$, i.e. $\mu_X(1 - \phi) = 0$, so we need $\mu_X = 0$ (unless $\phi = 1$, a case ruled out below since we will require $|\phi| < 1$).

By construction, $X_{t-h}X_t = X_{t-h}(\phi X_{t-1} + Z_t)$ for $h \geq 1$.

Taking expectations, $E(X_{t-h}X_t) = E\bigl(X_{t-h}(\phi X_{t-1} + Z_t)\bigr)$, so

$E(X_{t-h}X_t) = \phi E(X_{t-h}X_{t-1}) + E(X_{t-h}Z_t) = \phi E(X_{t-h}X_{t-1})$, since $Z_t$ is uncorrelated with $X_{t-h}$ (and both have mean zero).

Then we have $\gamma_X(h) = \phi \gamma_X(h-1) = \phi[\phi \gamma_X(h-2)] = \ldots = \phi^h \gamma_X(0)$, and $\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \phi^h$.

By symmetry and stationarity, we have $\rho_X(h) = \phi^{|h|}$ for all integer $h$.

$\gamma_X(0) = \mathrm{Cov}(X_t, X_t) = E\bigl[(\phi X_{t-1} + Z_t)(\phi X_{t-1} + Z_t)\bigr] = \phi^2 E(X_{t-1}^2) + 2\phi E(X_{t-1}Z_t) + E(Z_t^2) = \phi^2 \gamma_X(0) + \sigma^2$, where the cross term $E(X_{t-1}Z_t)$ vanishes because $X_{t-1}$ and $Z_t$ are uncorrelated and both have mean zero.

Therefore, we have $\gamma_X(0) = \frac{\sigma^2}{1-\phi^2}$, which requires $|\phi| < 1$.
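As a quick sanity check, here is a minimal R sketch (not part of the lecture; the choices $\phi = 0.7$, $n = 500$, and standard normal innovations are arbitrary) that simulates an AR(1) series and compares the sample ACF with $\phi^{|h|}$, and the sample variance with $\sigma^2/(1-\phi^2)$:

```r
# Simulate an AR(1) with phi = 0.7 and standard normal innovations (sigma^2 = 1)
set.seed(545)
phi <- 0.7
x   <- arima.sim(model = list(ar = phi), n = 500)

# Sample ACF vs. the theoretical ACF phi^|h|
sample_acf      <- acf(x, lag.max = 10, plot = FALSE)$acf[, , 1]
theoretical_acf <- phi^(0:10)
round(cbind(lag = 0:10, sample = sample_acf, theory = theoretical_acf), 3)

# Sample variance vs. gamma_X(0) = sigma^2 / (1 - phi^2)
c(sample = var(x), theory = 1 / (1 - phi^2))
```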

Estimating Autocorrelation

Let $X_1, \ldots, X_n$ be observed values from a stationary sequence, with sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

For two random variables, the covariance is $\mathrm{Cov}(V, W) = E[(V - E(V))(W - E(W))]$, and an unbiased estimator of the covariance from a sample is $\widehat{\mathrm{Cov}}(V, W) = \frac{\sum_{i=1}^{n}(V_i - \bar{V})(W_i - \bar{W})}{n - 1}$.

$\gamma_X(h) = \mathrm{Cov}(X_{t+h}, X_t)$

$\hat{\gamma}_X(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}\bigl(X_{t+|h|} - \bar{X}\bigr)\bigl(X_t - \bar{X}\bigr)$ for $-n < h < n$

The sample autocorrelation is $\hat{\rho}(h) = \frac{\hat{\gamma}_X(h)}{\hat{\gamma}_X(0)}$, where $\hat{\gamma}_X(0) = \frac{1}{n}\sum_{t=1}^{n}(X_t - \bar{X})^2$.
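These estimators can be coded directly from the definitions above. A small sketch (my own, not from the notes), checked against R's built-in `acf()`, which uses the same $1/n$ convention:

```r
# Sample autocovariance gamma_hat(h) with the 1/n divisor, as defined above
sample_gamma <- function(x, h) {
  n    <- length(x)
  h    <- abs(h)
  xbar <- mean(x)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
}

# Sample autocorrelation rho_hat(h) = gamma_hat(h) / gamma_hat(0)
sample_rho <- function(x, h) sample_gamma(x, h) / sample_gamma(x, 0)

x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 200))
sapply(0:5, function(h) sample_rho(x, h))
acf(x, lag.max = 5, plot = FALSE)$acf[, , 1]   # should agree with the line above
```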

Classical Decomposition Model

$X_t = m_t + S_t + Y_t$

Here $m_t$ is the trend component, $S_t$ is the seasonal component, and $Y_t$ is the random noise component (so far we have four choices of noise: iid noise, white noise, MA(1), and AR(1)).

We can remove $m_t$ and $S_t$ to estimate $Y_t$, and we have two ways:

  1. estimate the trend and seasonal components using a "model" (filter)
  2. difference $\{X_t\}$ to remove the trend and seasonality (filter)

Estimate Trend

$X_t = m_t + Y_t, \quad t = 1, \ldots, n$, with $E(Y_t) = 0$

We can use nonparametric methods, which are flexible and make fewer assumptions, but are subjective:

  1. Finite Moving Average Filter (to capture local trend)

$W_t = \frac{1}{2q+1}\sum_{j=-q}^{q} X_{t-j} = \frac{1}{2q+1}\sum_{j=-q}^{q}(m_{t-j} + Y_{t-j}) = \frac{1}{2q+1}\sum_{j=-q}^{q} m_{t-j} + \frac{1}{2q+1}\sum_{j=-q}^{q} Y_{t-j} \approx \frac{1}{2q+1}\sum_{j=-q}^{q} m_{t-j}$, where $q$ is a positive integer (the noise terms roughly average out to zero over the window).

The moving average is a linear filter with weights $a_j = \begin{cases}\frac{1}{2q+1}, & \text{for } |j| \leq q \\ 0, & \text{otherwise}\end{cases}$

Our goal is $X_t - \hat{m}_t = \hat{Y}_t$; an R sketch of this filter (and of the exponential smoothing below) appears after this list.

  2. Exponential smoothing

$\hat{m}_t = \alpha X_t + (1-\alpha)\hat{m}_{t-1}$, where $\alpha \in [0, 1]$ is a smoothing parameter.

For $t = 1$, we have $\hat{m}_1 = X_1$. For $t \geq 2$, unrolling the recursion gives $\hat{m}_t = \sum_{j=0}^{t-2}\alpha(1-\alpha)^j X_{t-j} + (1-\alpha)^{t-1}X_1$.

  3. Parametric smoothing (linear, polynomial, basis functions such as B-splines)
  4. High-frequency smoothing using Fourier series
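
Below is an R sketch of the first two filters on simulated data (the linear trend, the window parameter $q = 5$, and the smoothing parameter $\alpha = 0.2$ are illustrative choices, not values from the lecture):

```r
set.seed(1)
n <- 120
x <- 0.5 * (1:n) + rnorm(n, sd = 3)      # X_t = m_t + Y_t with a linear trend

# 1. Finite moving average filter: weights a_j = 1/(2q+1) for |j| <= q
q    <- 5
m_ma <- stats::filter(x, rep(1 / (2 * q + 1), 2 * q + 1), sides = 2)

# 2. Exponential smoothing: m_hat_t = alpha*X_t + (1-alpha)*m_hat_{t-1}, m_hat_1 = X_1
alpha <- 0.2
m_exp <- numeric(n)
m_exp[1] <- x[1]
for (t in 2:n) m_exp[t] <- alpha * x[t] + (1 - alpha) * m_exp[t - 1]

# Detrended series Y_hat_t = X_t - m_hat_t (the moving average leaves NAs at the ends)
y_hat <- x - m_ma
```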
Differencing (for trend)

We define the lag-1 difference as $\nabla X_t = X_t - X_{t-1} = (1 - B)X_t$, where $B$ is the backward shift operator, defined by $BX_t = X_{t-1}$.

We can generalize $\nabla$ and $B$ to general lags by taking powers:

$B^j X_t = B^{j-1}(BX_t) = B^{j-1}X_{t-1} = \ldots = X_{t-j}$

$X_t - X_{t-j} = (1 - B^j)X_t$

Since $\nabla^j X_t = \nabla(\nabla^{j-1}X_t)$, we have:

$\nabla^2 X_t = \nabla(\nabla X_t) = \nabla\bigl((1-B)X_t\bigr) = (1-B)(1-B)X_t = (1 - 2B + B^2)X_t = X_t - 2X_{t-1} + X_{t-2} = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2})$

Let $X_t = m_t + Y_t$, where $m_t = a + bt$; then

$\nabla X_t = \nabla(m_t + Y_t) = \nabla m_t + \nabla Y_t = (m_t - m_{t-1}) + (Y_t - Y_{t-1}) = (a + bt) - (a + b(t-1)) + Y_t - Y_{t-1} = b + Y_t - Y_{t-1}$

Therefore $\nabla X_t$ will be stationary (with mean $b$) whenever $Y_t - Y_{t-1}$ is stationary.
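A quick numerical illustration (simulated data with assumed values $a = 10$, $b = 0.3$): after lag-1 differencing, the linear trend is gone and the series fluctuates around $b$.

```r
set.seed(2)
n <- 200
a <- 10; b <- 0.3
y <- rnorm(n)                      # a stationary noise sequence Y_t
x <- a + b * (1:n) + y             # X_t = a + b*t + Y_t

dx <- diff(x)                      # nabla X_t = X_t - X_{t-1} = b + Y_t - Y_{t-1}
mean(dx)                           # close to b = 0.3

d2x <- diff(x, differences = 2)    # nabla^2 X_t = X_t - 2 X_{t-1} + X_{t-2}
mean(d2x)                          # close to 0
```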

Estimate seasonal component

An example with $d = 4$:

| $k=1$ | $k=2$ | $k=3$ | $k=4$ | |
|---|---|---|---|---|
| $\tilde{x}_{1}$ | $\tilde{x}_{2}$ | $\tilde{x}_{3}$ | $\tilde{x}_{4}$ | $j=0$ |
| $\tilde{x}_{5}$ | $\tilde{x}_{6}$ | $\tilde{x}_{7}$ | $\tilde{x}_{8}$ | $j=1$ |
| $\tilde{x}_{9}$ | $\tilde{x}_{10}$ | $\tilde{x}_{11}$ | $\tilde{x}_{12}$ | $j=2$ |
| $\downarrow$ | $\downarrow$ | $\downarrow$ | $\downarrow$ | |
| $S_1$ | $S_2$ | $S_3$ | $S_4$ | |

$W_k = \frac{1}{n/d}\sum_{j=0}^{n/d - 1}\bigl(X_{k+jd} - \hat{m}_{k+jd}\bigr)$, the average of the detrended values falling in season $k$ (indexing as in the table, assuming $n$ is a multiple of $d$; in practice the average is taken over the terms for which $\hat{m}_{k+jd}$ is available).

$\hat{S}_k = W_k - \frac{1}{d}\sum_{i=1}^{d} W_i$

Let $d_t = X_t - \hat{S}_t$ be the deseasonalized data. We can then re-estimate the trend $\tilde{m}_t$ from $d_t$, and finally set $\hat{Y}_t = X_t - \hat{S}_t - \tilde{m}_t$.
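Here is an R sketch of this seasonal-estimation step for $d = 4$ (the simulated quarterly-style data and the centered moving-average trend filter used for an even period are my own illustrative choices, not from the notes):

```r
set.seed(3)
d <- 4; n <- 48
x <- 2 + 0.1 * (1:n) + rep(c(3, -1, -2, 0), times = n / d) + rnorm(n, sd = 0.5)

# Trend estimate m_hat via a centered moving average
# (half weights at the two ends because the period d is even)
w     <- c(0.5, rep(1, d - 1), 0.5) / d
m_hat <- stats::filter(x, w, sides = 2)

# W_k: average of the detrended values falling in season k
detrended <- x - m_hat
W <- tapply(detrended, rep(1:d, times = n / d), mean, na.rm = TRUE)

# S_hat_k = W_k - (1/d) * sum(W_i), so the seasonal effects sum to zero
S_hat <- W - mean(W)
round(S_hat, 2)

# Deseasonalized data d_t = X_t - S_hat_t, from which the trend can be re-estimated
deseason <- x - rep(S_hat, times = n / d)

# Base R's decompose() performs the same classical decomposition on a ts object:
# decompose(ts(x, frequency = d))
```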

Differencing (for seasonal)

$\nabla_d X_t = X_t - X_{t-d} = (1 - B^d)X_t$

Applying this to $X_t = m_t + S_t + Y_t$, we have:

$\nabla_d X_t = \nabla_d(m_t + S_t + Y_t) = (m_t - m_{t-d}) + (S_t - S_{t-d}) + (Y_t - Y_{t-d})$

Since $S_t$ is periodic with period $d$, $S_t - S_{t-d} = 0$; therefore $\tilde{X}_t := \nabla_d X_t = (m_t - m_{t-d}) + (Y_t - Y_{t-d})$.

If $m_t = a + bt$, then $\tilde{X}_t = \bigl((a + bt) - (a + b(t-d))\bigr) + (Y_t - Y_{t-d}) = bd + (Y_t - Y_{t-d})$.
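A short simulated check (assumed values $b = 0.2$, $d = 12$): lag-$d$ differencing removes the period-$d$ seasonal pattern and leaves a series centered at $bd$.

```r
set.seed(4)
d <- 12; n <- 240
a <- 5; b <- 0.2
s <- rep(sin(2 * pi * (1:d) / d), times = n / d)   # seasonal component with period d
x <- a + b * (1:n) + s + rnorm(n, sd = 0.5)        # X_t = m_t + S_t + Y_t

x_tilde <- diff(x, lag = d)   # nabla_d X_t = X_t - X_{t-d}
mean(x_tilde)                 # close to b * d = 2.4
```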

If $Y_t \stackrel{\text{iid}}{\sim} (0, \sigma^2)$, then for each fixed $h \geq 1$ we have, approximately for large $n$, $\hat{\rho}(h) \stackrel{\cdot}{\sim} N\left(0, \frac{1}{n}\right)$. (Stated without proof.)

(Recall that $\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} = \frac{\sum_{t=1}^{n-|h|}(X_t - \bar{X})(X_{t+|h|} - \bar{X})/n}{\sum_{t=1}^{n}(X_t - \bar{X})^2/n}$.)
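
This approximation is what produces the dashed bands in R's default `acf()` plot. A quick sketch with simulated iid noise, checking that roughly 95% of the sample autocorrelations at lags $h \geq 1$ fall within $\pm 1.96/\sqrt{n}$:

```r
set.seed(5)
n <- 400
y <- rnorm(n)                 # iid noise

rho_hat <- acf(y, lag.max = 20, plot = FALSE)$acf[-1, , 1]  # lags 1..20
bound   <- 1.96 / sqrt(n)
mean(abs(rho_hat) < bound)    # proportion inside the bands, typically around 0.95
```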